Efficient free energy profile reconstruction 
using adaptive stochastic perturbation protocols 
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Application of Jarzynski nonequilibrium work relation to free energy calculation is limited by the 
very slow convergence of the estimate when dissipation is high. We present a novel perturbation 
protocol able to improve the convergence of Jarzynski estimator when it is applied in the reconstruc- 
tion of the potential of mean force. The improvement is based on the application of the adaptive 
external work variation in addition to the one caused by thermal fluctuations. 
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Mechanical properties of bio-polymers, e.g. proteins 
and nucleic acids, often determine their functioning and 
play a significant role in the interactions they have with 
other biomolecules. Single molecule manipulation exper- 
iments using atomic force microscopy (AFM) 0, Q , op- 
tical tweezers 0| and in ~ silico methods such as steered 
molecular dynamics (SMD) [l], 0, [B[ opened a possibil- 
ity for the analysis of those properties. The mechanical 
resistance measured during a single molecule stretching 
experiment is determined by the molecule's free energy 
profile, also called potential of mean force (PMF). The 
knowledge of this profile is therefore essential for the un- 
derstanding of the biopolymers' mechanical behavior. 

The second law of thermodynamics states that the av- 
erage external work (W) used to perturb a given system 
between two states is always greater than or equal to 
the corresponding free energy difference, namely (W) > 
AF. The equality applies only when the external per- 
turbation is reversible [fj . The direct calculation of free 
energy difference is difficult in the non-reversible cases 
where an average dissipation is significant and unknown. 
That difficulty is additionally pronounced when PMF has 
to be calculated because the behavior of the dissipation 
changes along the reaction coordinate ■ 

In 1997 C. Jarzynski presented a theoretical framework 
able to cope with the problem of free energy calculation 
in the form of the nonequilibrium work relation Q. 
This relation gives a direct connection between the ex- 
ponential average of the external work used to move a 
system between two equilibrium states and the exponen- 
tial value of the corresponding free energy difference 
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The most important property of this equality is that the 
work performed on a given system does not have to be 
reversible. It is satisfied for any perturbation, close to 
equilibrium or far from it [H, [13] ■ 

When an external perturbation is far from equilibrium 
and the number of work samples is limited.the Jarzyn- 
ski based PMF estimate contains a bias 0, The 
bias can be seen in the estimate based on the numer- 
ical Brownian simulation of a single molecule constant 



velocity stretching experiment (Fig. 1) [ill ]. The aver- 
age work based on 2000 trajectories (Fig. 1, curve 2) 
in this experiment is significantly bigger than the under- 
lying PMF (Fig. 1, curve 1) when pulling velocity is 
high. Only a modest improvement can be achieved us- 
ing Jarzynski estimator with the same 2000 trajectories 
(Fig. 1, curve 3). The reason behind the Jarzynski bias 
lies in the fact that nonequilibrium work relation em- 
phasizes rarely occurring work samples with a small or 
negative dissipation [101 ]. 

The problem of the slow convergence of Jarzynski es- 
timator in the case of the normal single molecule 
constant velocity pulling experiments (normal pulling) 
can be overcome by the modification of the perturbation 
protocol. That modification should be able to increase 
the probability of generating work samples with a small 
dissipation. The reduction of the pulling velocity is an 
obvious way to achieve that, but it is often experimen- 
tally difficult or very costly [7j . The broader distribution 
of the external work can produce the same effect. This 
broadening can happen either unintentionally, e.g. due 
to the imperfections of the experimental setup, or it can 
be intentionally introduced through the additional ran- 
dom variation of the external work. The intentionally 
introduced work variation can be applied via symmetri- 
cally distributed random perturbation of the pulling can- 
tilever/spring, i.e. through the external noise with the 
mean value equal to zero. With this kind of the external 
perturbation, the measured work has larger variation but 
the same mean value as the work in the normal pulling 
experiments. Fig. 1 shows two Jarzynski PMF estimate 
(curves 4 & 5) based on the constant velocity pulling 
with the additional external noise. Although calculated 
with the significantly smaller number of samples (both of 
those estimates are averages of 10 reconstructions, each 
based on 20 trajectories) those two estimates have much 
smaller bias due to increased work variation than the es- 
timate coming from the normal pulling (Fig. 1, curve 3). 
In both cases, i.e. in normal pulling and in pulling with 
the additional noise, the exponential work averages along 
the pulling coordinate were calculated using weighted his- 
togram protocol Uj. The external work was calculated 
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using a constant velocity pulling assumption which is, 
obviously, only an approximation. 




extension (A) 

FIG. 1: Original free energy profile compared to the recon- 
structions based on the normal pulling and stochastic pulling, 
(f) Original PMF; (2) (W); (3) Estimate based on normal 
pulling (4) Estimate, const velocity + noise, m—50; (5) Esti- 
mate, const, velocity + noise, m=80. 

The thorough analysis of the influence of the additional 
noise on the behavior of Jarzynski-based PMF estimates 
in the single molecule manipulation experiments can be 
performed using Brownian motion formalism The 
trajectory of the pulled point is described using a dis- 
cretized, one dimensional variant of Langevin equation 
r(r) = r(t - 1) + /3Df(r(t - l))At + a r ■ rj(t) which con- 
nects the position of a pulled point (which corresponds 
to the extension of a molecule in AFM and SMD exper- 
iments) to the force f(r) acting on it and to the ran- 
dom perturbation ay ■ 7/(i)caused by the thermal fluctua- 
tions of the molecule and its environment Q. The force 
f(r) is the first spatial derivative of the time-dependent 
Hamiltonian H(r, t) = AG(r) + k(x(t) - r) 2 /2, which 
describes a system made of a protein with the extension- 
dependent free energy profile AG(r) (Gibbs' free energy 
profile, Fig. 1, curve 1) and harmonic potential with the 
spring coefficient k. The influence of the thermal envi- 
ronment is represented via normally distributed random 
perturbation a r ■ rj(t) with a zero-mean and the variance 
(of) = 2D At. The quantity D is the diffusion coefficient 
[6| and At is the time step. The molecule is extended by 
the movement of the pulling point with the position x{t) 
along the reaction coordinate. In this framework, the 
additional external noise can be introduced via instan- 
taneous stochastic perturbation a x ■ rj(t) of the pulling 
point: x(t) — v ■ t + a x ■ i](t). This approach allows 
calculation of the external work using a constant pulling 
velocity approximation. For simplicity, we used a normal 
distribution to guide the external noise. An additional 
reason for the usage of this distribution is that according 
to the central limit theorem, the cumulative effect of any 
random signal follows Gaussian distribution. 

The work generated by the random movement of the 
pulling point can be treated as a stochastic variable 
and expressed as a function of the instantaneous ran- 



dom variation of the spring extension X and spring con- 
stant k, as W = k ■ X 2 . If fx(x) is the distribution 
function of the stochastic variable X, then the distri- 
bution function of the random work Wn is fw R ( w ) — 
{fxiV^Jk) +f x {-y/wJk))/{2 ■ V^kA When the 
external noise is applied at the every time step of the 
simulation, its deviation can be expressed as a multiple 
of the random deviation of the pulled point using the 
multiplication factor m, a x = m ■ ay = m ■ v 2D At. The 
diffusion coefficient D can be calculated from the slope 
of the average external work (e.g. Eq. 91-92 from [7|) 
but it is not required if one does not want to express a x 
through er r . When a random movement of the pulling 
point is normally distributed, the distribution of the ran- 
dom work fwrt(w) is a chi-square function 

fw R (w) = (a x ■ V2n -k-w)- 1 ■ exp(-(w/k)/2ai), (2) 

with a standard deviation 

o Wr = V3 • k ■ a 2 x = V3 • k ■ m 2 ■ 2D At. (3) 

Jarzynski relation should be able to give free energy 
difference no matter what kind of work distribution 
guides a system between two states, but empirical results 
show that the work distribution properties (variance and 
distribution function) have strong influence on the con- 
vergence of the estimate when the number of samples is 
limited 0, E3] • If work variation (natural or externally 
induced) is greater than the difference between the aver- 
age work and AG, Jarzynski relation may underestimate 
free energy difference (Fig. 1, curves 4 & 5) and if 
work variation is too small, it can not reduce the bias. 
Our analysis shows that for a modest number of samples 
(between 20 and 200 trajectories) the standard deviation 
o w r °f t ne additional random work has to be close to the 
Jarzynski bias based on normal pulling (<Jw n ~ bias) to 
be able to reduce it. In that case, Eq. 3 can be used to 
obtain the noise multiplication factor m needed to attain 
such a work variation 

m ~ y/bias/ (1.73 -k ■ 2D At). (4) 

To obtain the maximum bias along the pulling coordinate 
we applied Eq. 9 from [Io| which connects the maximum 
fluctuation(variation) of the estimate (Fig. 2a, curve 3) 
to its bias <7j = Var(e-P w *-)/0 B N = 2 ■ Mas(N)/f3. 

For a typical SMD setup (k = 28 N/m, D = 
l.OSS-lO- 11 /?^- 1 time step At = 10" 13 s and max- 
imum bias = 20fcsT, the above described procedure es- 
timates the multiplication factor m to be around 28 (a x 
= 0.4A). For a big ger bias, lOOfc^T the same procedure 
estimates m to be 64 (a x = 0.87A). The effect of bias 
reduction is not very sensitive to the exact value of m, 
therefore we applied rounding of the calculated factor m 
to the nearest lower decade. 

The PMF estimates based on the constant velocity 
pulling (0.6 m/s) with external noise shown on Fig. 1 
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are averages of 10 Jarzynski reconstructions each based 
on 20 trajectories. The first estimate (Fig. 1, curve 4) 
was obtained with m = 50 and the second with m = 80 
(Fig. 1, curve 5). 

The additional noise helps in decreasing the overall 
bias of the PMF estimate between the initial and final 
state with a much smaller number of work trajectories 
but generates an underestimate when a random work de- 
viation is greater than the bias. The additional noise 
with the smaller standard deviation (m = 50) decreases 
the overall difference of the estimate and PMF (Fig. 1, 
curve 4) but overestimates A G between two equilibrium 
states. The noise with the greater deviation (m = 80) 
decreases maximum bias much more efficiently but gen- 
erates significant underestimate along the pulling trajec- 
tory (Fig. 1, curve 5). Those results show that a simple 
addition of the external noise can not consistently im- 
prove Jarzynski-based PMF calculation. 

The PMF underestimate can be reduced if the exter- 
nal noise is adapted to the behavior of the bias along 
the reaction coordinate. The behavior of the bias is re- 
flected in the estimate fluctuations along the reaction co- 
ordinate 0, E3] • The difference between the reconstruc- 
tion and its smoothed version gives those fluctuations 
AG n „i se (r) (Fig. 2a). To get a smoothed version of the 
estimate we applied a simple low-pass filtering technique 
using a fifth-order Butterworth filter [15] with the nor- 
malized cutoff frequency 10 for 500 samples along the 
reaction path and 20dB stop band attenuation; the rest 
of the harmonic spectrum has two orders of magnitude 
smaller amplitude and thus belongs to the fluctuations. 
To examine the behavior of the estimate's fluctuations 
we calculated 5 reconstructions per pulling velocity; each 
of them was based on 20 trajectories. The absolute, 
normalized, average version of the estimate fluctuations 
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) and their filtered variant 

Vnoise(i") are shown on Fig. 2b in comparison to the 
original, normalized PMF (pulling velocity 0.6 m/s). 

We developed two adaptive stochastic perturbation 
(ASP) protocols which use position dependent function 
Vnoise(i") to adjust the noise to the bias. The first proto- 
col is amplitude modulation (AM) and it modulates the 
standard deviation of the applied noise. The second pro- 
tocol modulates the frequency of the noise appearance so 
it is named frequency modulation (FM) . 

The amplitude modulation multiplies factor m with 
the current value of the function V no i se {r) as a way to 
improve the free energy reconstruction in a position de- 
pendent fashion. In this case the effective standard de- 
viation of the additional noise a x is not constant but de- 
pends upon the pulled coordinate r(t), a x — ( V no i se (r) ■ 
m) -a r = ( V no i se (r) ■ m) ■ \J 2D At. Dotted curves on Fig. 
3 show reconstructions based on two different pulling ve- 
locities, 0.2 m/s and 0.6 m/s. For both velocities we 
calculated V no i se (r) and m using reconstructions based 
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extension (A) 

FIG. 2: a) Reconstruction, filtered reconstruction and their 
difference, b) Average absolute normalized fluctuations of 5 
reconstructions based on the pulling velocity 0.6 m/s com- 
pared to the normalized PMF. Thick line is V„ ise{r). 



on the normal pulling. The final PMF estimate was cal- 
culated as an ordinary average of 10 reconstructions (200 
trajectories each). The thin profile on each subplot is re- 
construction based on the normal pulling with the same 
number of trajectories (2000). Those results show that 
AM perturbation protocol can decrease bias without a 
significant underestimate. 

The second method used to adapt noise is based on the 
modulation of the frequency of its application (FM). The 
noise in that case is not applied uniformly in time but its 
appearance depends on V no i se (r) via the output of an 
additional random generator applied at the every time 
step of the simulation. This generator produces uniform 
random numbers between and 1; the additional noise 
is applied only if the output of this generator is smaller 
than the current value of V no i se {r). When applying this 
protocol we used the same values of the multiplication 
factor m and the number of trajectories as in AM case. 
The dashed lines on Fig. 3 show the efficiency of the FM 
protocol in improving the Jarzynski estimate. It can be 
clearly seen that this protocol is able to reduce the bias 
with a minimal underestimate. 

Fig. 4 shows the behavior of the estimate's RMSD 
(root mean square deviation, expressed as the percent- 
age of the barrier height) for both modulation protocols 
and for normal pulling. When pulling velocity is 0.2 m/s 
and number of samples is limited (20^200), the bias is 
close to 20fcgT. The corresponding value of m is 28, but 
we used three values of this factor (10, 20 and 30) to test 
the effects of low and high noise. For the faster pulling 
velocity (0.6 m/s) we also conducted experiments with 
three values of m (40, 50 and 60) instead of using an 
exact value (m = 67 for bias ~ HO&sT based on 2000 
trajectories). Fig. 4 shows that the additional noise can 
significantly decrease the bias along the reaction coordi- 
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mally pulled trajectories. 

ASP is different from periodic loading [l6j], reversible 
pulling [l3| and improved sampling strategies based on 



FIG. 3: The PMF estimate based on the normal pulling (thin 
line) in comparison to the estimates based on amplitude (AM 
- dotted line) and frequency (FM - dashed line) modulated 
noise. Thick line is original PMF. 



nate. It also depicts the undesired effects, an increase 
of RMSD coming from the underestimate when the ran- 
dom work is much larger than the average bias, i.e. when 
pulling is slow or number of samples is large enough to 
solely decrease the bias without the additional noise. 




200 2000 

number of trajectories 



FIG. 4: RMSD for three different values of the factor m for 
both noise modulation protocols. Every point is an average 
of 10 reconstructions. 

Both ASP protocols are able to decrease the number 
of samples needed to achieve the acceptable accuracy of 
the reconstruction expressed through RMSD between a 
given PMF and its estimate. For pulling velocity such as 
0.2 m/s, the same RMSD (less than 10 % of the barrier 
height) obtained using 20000 trajectories without an ad- 
ditional noise can be achieved with only 200 trajectories 
and noise modulation. For a faster pulling, 0.6 m/s, the 
improvement is even better because the quality of the re- 
construction obtained with 200 or 2000 trajectories and 
noise modulation can not be achieved with 20000 nor- 
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random jumps from the trial trajectory 
approaches either require a memorizing of the current 
conformation which makes the whole procedure impossi- 
ble with AFM experiments or they require going back- 
ward through the energy landscape, a difficult task to 
perform when proteins are pulled fast .20]. The last 
feature is very important in protein manipulation be- 
cause proteins can not refold instantly, i.e. they need 
much more time to refold spontaneously than to unfold 
mechanically The noise adaptation protocols per- 

form excellent in this aspect because they additionally 
decrease the probability of a sudden unfolding at the be- 
ginning of the perturbation process when polymer, i.e. 
protein is in the folded state. The skewed momenta pro- 
tocol [2l| is similar to ASP but it directly introduces fluc- 
tuation to the pulled point and does not adjust it to the 
bias, and therefore does not avoid bias. Both noise mod- 
ulation protocols can be modified to PMF calculation in 
other types of physical and chemical experiments with 
suitable modulation of the corresponding reaction coor- 
dinates. Our current research is focused on the more effi- 
cient estimation of the noise guiding function V no i se (r). 
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